This listing of claims will replace all prior versions, and listings, of claims in the application.
Listing of Claims: 

1. (currently amended) A method for detecting similar objects in a collection of such
objects, comprising, for each of two objects: 

modifying a previous method for detecting similar objects so that memory 
requirements are reduced while avoiding false detections approximately as well as in the 
previous method, wherein the modifying comprises: 

combining four samples of features into each of seven supersamples, wherein
the number of samples is reduced from a number of samples used in the previous
method;

compressing each of the total number of supersamples to a number of bits of 
precision, wherein the number of bits of precision is reduced from a number of bits of 
precision used in the previous method, and wherein the number of bits of precision is reduced
by generating supersamples that do not include at least one least significant bit of the
supersamples used in the previous method; and

requiring a number of matching supersamples out of the total number of 
supersamples in order to conclude that the two objects are sufficiently similar, wherein the 
number of matching supersamples is greater than a number of matching supersamples 
required in the previous method. 

2. (original) The method of claim 1 wherein requiring the number of matching 
supersamples comprises requiring all but one of the total number of supersamples to match. 

3. (original) The method of claim 1 wherein requiring the number of matching 
supersamples comprises requiring all but two of the total number of supersamples to match. 

4. (original) The method of claim 1 wherein requiring the number of matching 
supersamples comprises requiring all supersamples to match. 



5. (cancelled)

6. (previously presented) The method of claim 5 wherein:

compressing each supersample to the first number of bits of precision comprises 
recording each supersample to 16 bits of precision, wherein the second number of bits of
precision used in the previous method is 64; and

requiring the number of matching supersamples comprises requiring four 
supersamples of six to match, wherein the number of matching supersamples required in the 
previous method is two supersamples of six. 

7. (original) The method of claim 5 wherein requiring the number of matching 
supersamples comprises requiring five supersamples of seven to match, wherein the number
of matching supersamples required in the previous method is two supersamples of six. 
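The memory reduction recited in claims 6 and 7 can be checked with simple arithmetic. The following is a minimal sketch, not part of the claims; it assumes the previous method stores six supersamples at 64 bits each (as recited in claims 6 and 7), and that the seven-supersample variant also uses 16 bits per supersample (as recited in claim 14). The function name `sketch_bits` is hypothetical.

```python
def sketch_bits(num_supersamples: int, bits_per_supersample: int) -> int:
    """Bits needed to store one object's supersamples."""
    return num_supersamples * bits_per_supersample


# Previous method per claims 6 and 7: six supersamples at 64 bits each.
previous_bits = sketch_bits(6, 64)    # 384 bits per object

# Claim 6: six supersamples recorded to 16 bits each.
claim6_bits = sketch_bits(6, 16)      # 96 bits per object

# Seven supersamples at 16 bits each (16-bit width assumed from claim 14).
claim7_bits = sketch_bits(7, 16)      # 112 bits per object

print(previous_bits, claim6_bits, claim7_bits)
```

Under these assumptions the per-object storage falls from 384 bits to 96 or 112 bits, while the match threshold rises from two of six to four of six or five of seven.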

8. (original) The method of claim 1 wherein the objects are documents, and the 
method is used in association with a search engine query service to determine clusters of 
query results that are near-duplicate documents. 

9. (original) The method of claim 8, further comprising selecting a single document 
in each cluster to report. 

10. (original) The method of claim 9 wherein selecting the single document is by 
way of a ranking function. 
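The clustering and selection recited in claims 8 through 10 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `cluster_and_select`, `is_near_duplicate`, and `rank` are hypothetical, the near-duplicate test is passed in as a black box, and clusters are formed by treating near-duplication as transitive (a union-find grouping), which the claims do not specify.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_and_select(
    docs: Sequence[T],
    is_near_duplicate: Callable[[T, T], bool],
    rank: Callable[[T], float],
) -> List[T]:
    """Group near-duplicate documents and return the top-ranked one per cluster."""
    parent = list(range(len(docs)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair judged to be near-duplicates.
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if is_near_duplicate(docs[i], docs[j]):
                parent[find(i)] = find(j)

    # Collect the members of each cluster, keyed by cluster root.
    clusters: Dict[int, List[T]] = {}
    for i, doc in enumerate(docs):
        clusters.setdefault(find(i), []).append(doc)

    # Report a single document per cluster, chosen by the ranking function.
    return [max(members, key=rank) for members in clusters.values()]
```

The pairwise loop is quadratic in the number of query results, which is acceptable for a single result page but would need an index over supersamples for a larger collection.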

11-13. (cancelled) 

14. (currently amended) A method for determining groups of near-duplicate items in a 
search engine query result, comprising, for each of two items being compared: 

combining four samples of features into each of seven supersamples; 

compressing each supersample to 16 bits of precision by recording the 16 most 
significant bits of the supersample; and

requiring five of the seven supersamples to match.

15. (original) The method of claim 14, further comprising selecting a single
document in each cluster to report. 

16. (original) The method of claim 15 wherein selecting the single document is by 
way of a ranking function. 

17. (currently amended) A computer-readable medium embodying machine instructions
implementing a current method for detecting similar objects in a collection of such objects, 
wherein the current method comprises modification of a previous method for detecting 
similar objects so that memory requirements are reduced while avoiding false detections 
approximately as well as in the previous method, the current method comprising: 

combining four samples of features into each of seven supersamples, wherein
the number of samples is reduced from a number of samples used in the previous
method;

compressing each of the total number of supersamples to a number of bits of 
precision, wherein the number of bits of precision is reduced from a number of bits of 
precision used in the previous method, wherein the number of bits of precision is reduced by
generating supersamples that do not include at least one least significant bit used in the
previous method; and

requiring a number of matching supersamples in order to conclude that the two 
objects are sufficiently similar, wherein the number of matching supersamples is greater than 
a number of matching supersamples required in the previous method. 

18. (original) The computer-readable medium of claim 17 wherein requiring the 
number of matching supersamples comprises requiring all but one of the total number of 
supersamples to match. 

19. (original) The computer-readable medium of claim 17 wherein requiring the 
number of matching supersamples comprises requiring all but two of the total number of 
supersamples to match. 






DOCKET NO.: 307238.01 /MSFT-5031 PATENT 

Application No.: 10/805,805 

Office Action Dated: August 10, 2007 

20. (original) The computer-readable medium of claim 17 wherein requiring the 
number of matching supersamples comprises requiring all supersamples to match. 

21. (cancelled) 

22. (currently amended) A computer-readable medium embodying machine instructions 
implementing a method for determining groups of near-duplicate items in a search engine 
query result, comprising, for each of two items being compared: 

combining four samples of features into each of seven supersamples;

compressing each supersample to 16 bits of precision by recording the 16 most
significant bits of the supersample; and

requiring five of the seven supersamples to match. 
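
The three steps recited in claims 14 and 22 can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes each feature sample is already available as an unsigned 64-bit integer, that 28 samples are grouped consecutively into seven groups of four, and that the combining function is a BLAKE2b hash truncated to 64 bits. None of those choices is specified by the claims, and all function names are hypothetical.

```python
import hashlib
from typing import List, Sequence

SAMPLES_PER_SUPERSAMPLE = 4   # four samples of features per supersample
NUM_SUPERSAMPLES = 7          # seven supersamples per item
BITS_KEPT = 16                # 16 most significant bits are recorded
REQUIRED_MATCHES = 5          # five of the seven supersamples must match


def combine(samples: Sequence[int]) -> int:
    """Combine four 64-bit samples into one 64-bit value (hash choice is an assumption)."""
    data = b"".join(s.to_bytes(8, "big") for s in samples)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def compress(value: int) -> int:
    """Keep only the 16 most significant bits of a 64-bit supersample."""
    return value >> (64 - BITS_KEPT)


def supersamples(samples: Sequence[int]) -> List[int]:
    """Turn 28 samples into seven 16-bit supersamples."""
    expected = SAMPLES_PER_SUPERSAMPLE * NUM_SUPERSAMPLES
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    return [
        compress(combine(samples[i:i + SAMPLES_PER_SUPERSAMPLE]))
        for i in range(0, expected, SAMPLES_PER_SUPERSAMPLE)
    ]


def near_duplicates(a: Sequence[int], b: Sequence[int]) -> bool:
    """Compare supersamples position by position and require five of seven to match."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches >= REQUIRED_MATCHES
```

Two items would be compared by calling `near_duplicates(supersamples(samples_a), supersamples(samples_b))`, where `samples_a` and `samples_b` are the 28 feature samples of each item.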